Disambiguation with Feature Selection and Semi-Supervised Learning

Author

  • Akira Shimazu
Abstract

1. Objective. Word Sense Disambiguation (WSD) is the task of determining the correct sense of a polysemous word in a given context. This study aims to improve supervised word sense determination by focusing on feature selection and on bootstrapping techniques. Determining the sense of a word relies essentially on information extracted from the context in which the word appears. This information is represented by a set of features: unordered words, ordered words, collocations (sequences of words that include the target word), and grammatical constituents. Observing that some of these features may be redundant, and that different ways of combining them lead to different WSD results, we focused the first part of our work on feature selection. A second problem is the lack of labeled data for supervised methods: labeled data is scarce or expensive, whereas unlabeled data is abundant but is not used by conventional supervised methods. The second part of this study therefore addresses how to exploit unlabeled data in WSD with semi-supervised learning (bootstrapping) algorithms.
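
As an illustration of the ideas in the abstract, the sketch below extracts three of the listed feature types (unordered words, ordered words, and collocations that include the target word) and then runs a simple self-training loop over unlabeled contexts. It is not the method of the thesis: the function names, the feature-overlap scorer, the `window` and `min_margin` parameters, and the toy "bank" sentences are all assumptions made for the example. Grammatical constituents are left out because they would require a parser, and no feature selection step is shown.

```python
from collections import Counter


def extract_features(tokens, target_index, window=3):
    """Context features for the word at target_index (hypothetical scheme)."""
    features = set()
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i == target_index:
            continue
        word = tokens[i].lower()
        features.add(f"bow={word}")                          # unordered word
        features.add(f"pos[{i - target_index:+d}]={word}")   # ordered word
    # Collocations: contiguous word sequences that include the target word.
    for start in range(lo, target_index + 1):
        for end in range(target_index + 1, hi + 1):
            if end - start >= 2:
                ngram = "_".join(t.lower() for t in tokens[start:end])
                features.add(f"col={ngram}")
    return features


def train(labeled):
    """Count how often each feature co-occurs with each sense."""
    counts = {}
    for feats, sense in labeled:
        counts.setdefault(sense, Counter()).update(feats)
    return counts


def predict(counts, feats):
    """Return (best_sense, margin over the runner-up) from overlap scores."""
    scores = {s: sum(c[f] for f in feats) for s, c in counts.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return ranked[0][0], ranked[0][1] - runner_up


def self_train(labeled, unlabeled, min_margin=1, max_rounds=5):
    """Bootstrapping: repeatedly add confidently labeled examples."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        counts = train(labeled)
        confident, rest = [], []
        for feats in pool:
            sense, margin = predict(counts, feats)
            (confident if margin >= min_margin else rest).append((feats, sense))
        if not confident:
            break  # nothing new was labeled with enough confidence
        labeled.extend(confident)
        pool = [feats for feats, _ in rest]
    return train(labeled)


# Toy data (invented for this example); the target word is "bank".
labeled = [
    (extract_features("he sat on the river bank".split(), 5), "shore"),
    (extract_features("she deposited money in the bank".split(), 5), "finance"),
]
unlabeled = [
    extract_features("the muddy river bank flooded".split(), 3),
    extract_features("money from the bank loan".split(), 3),
]
model = self_train(labeled, unlabeled)
loan_sense, _ = predict(model, extract_features("a loan from the bank".split(), 4))
muddy_sense, _ = predict(model, extract_features("the muddy bank".split(), 2))
```

Requiring a minimum score margin before adding an automatically labeled example is one common way to limit the noise that bootstrapping can introduce; other confidence criteria would fit the same loop.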

Similar resources

Theme: A Study of Classifier Combination and Semi-Supervised Learning for Word Sense Disambiguation

1. Aims. Word Sense Disambiguation (WSD) involves associating a polysemous word in a text or discourse with a particular sense among the numerous potential senses of that word. In this thesis, we present a study of classifier combination and semi-supervised learning for WSD, which aims to boost supervised WSD and improve its accuracy. In addition, we also work on context representation and fe...

A Semi-Supervised Feature Clustering Algorithm with Application to Word Sense Disambiguation

In this paper we investigate an application of feature clustering for word sense disambiguation, and propose a semi-supervised feature clustering algorithm. Compared with other feature clustering methods (e.g., supervised feature clustering), it can infer the distribution of class labels over (unseen) features unavailable in training data (labeled data) by the use of the distribution of class labe...

Semi-Supervised Learning for Word Sense Disambiguation: Quality vs. Quantity

In this paper, we discuss the importance of the quality against the quantity of automatically extracted examples for word sense disambiguation (WSD). We first show that we can build a competitive WSD system with a memory-based classifier and a feature set reduced to easily and efficiently computable features. We then show that adding automatically annotated examples improves the performance of ...

Graph Laplacian for Semi-supervised Feature Selection in Regression Problems

Feature selection is fundamental in many data mining or machine learning applications. Most of the algorithms proposed for this task make the assumption that the data are either supervised or unsupervised, while in practice supervised and unsupervised samples are often simultaneously available. Semi-supervised feature selection is thus needed, and has been studied quite intensively these past f...

Word Sense Disambiguation by Semi-supervised Learning

In this paper we propose to use a semi-supervised learning algorithm to deal with the word sense disambiguation problem. We evaluated a semi-supervised learning algorithm, the local and global consistency algorithm, on a widely used benchmark corpus for word sense disambiguation. This algorithm yields encouraging experimental results. It achieves better performance than orthodox supervised learning algor...

Publication date: 2001